Developing practical achievement tests for use at the primary-grade level is a
difficult task. Some of the problems encountered appear to be resolved by using
verbally administered yes-no tests. But such tests are criticized as having low
reliability because they offer only two choices. Two modifications of the yes-no
test have been proposed to increase reliability. One is the 'matched-pairs'
technique, in which every 'yes' item has a matching item to be answered 'no'. Both
items must be answered correctly for either to be counted. The second technique is
the all-no test, an attempt to counter children's proclivity to answer 'yes' even
when the answer is not known. Some 200 first-grade children were administered an
economics test in which all three techniques were used. The test scores indicated
that the all-no test had the greatest reliability, but it was less valid than the
matched-pairs test. Thus, the matched-pairs test would be the best way to construct
yes-no achievement tests. Another article by the same authors (see PS 001 819) also
deals with the subject. (WD)
PROBLEM

Developing practical achievement tests for use at the primary-grade 
level is a difficult task for teachers or researchers. Written multiple- 
choice tests require reading ability on the part of the child. Multiple- 
choice picture tests require too much time to construct. Interviews require 
too much time to administer, and they are difficult to standardize. 

These problems could be surmounted by using verbally administered 
YES-NO tests.* Such tests, however, have low reliability because they offer 
the student only two options on each item. They also are difficult to inter- 
pret because they are sensitive to acquiescence response set (Shaver and 
Larkins, 1966; Larkins and Shaver, 1967). 

Larkins and Shaver (1967) reported an attempt to correct for response
set and increase the reliability of the YES-NO test. They produced a YES-NO
Primary Grades Economics Test (called PET-1) which was written with reversals
and scored using matched-pairs. That is, for every item for which the correct
response was YES (YES items), a matching item was written for which the correct
response was NO (NO items); and the student was required to respond correctly
to both the YES item and the NO item in a matched-pair before receiving
credit for either item.
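The scoring rule described above can be illustrated with a minimal sketch. The six-item answer key, the pairing of items, and the two student response patterns are hypothetical and are not taken from PET-1; the function names are likewise chosen only for this illustration.

```python
from itertools import product

YES, NO = "YES", "NO"

def yes_no_score(responses, key):
    """Ordinary scoring: one point for every item answered correctly."""
    return sum(given == correct for given, correct in zip(responses, key))

def matched_pairs_score(responses, pairs):
    """Matched-pairs scoring: one point per pair, and only when the YES item
    was answered YES and its matching NO item was answered NO."""
    return sum(responses[y] == YES and responses[n] == NO for y, n in pairs)

# Hypothetical six-item test: items 0, 2, 4 are YES items and
# items 1, 3, 5 are their matching NO items.
key = [YES, NO, YES, NO, YES, NO]
pairs = [(0, 1), (2, 3), (4, 5)]

always_yes = [YES] * 6                          # a purely acquiescent student
knows_two_pairs = [YES, NO, YES, NO, YES, YES]  # misses one NO item

print(yes_no_score(always_yes, key))                # 3
print(matched_pairs_score(always_yes, pairs))       # 0
print(matched_pairs_score(knows_two_pairs, pairs))  # 2

# Chance of earning credit on a single pair by blind guessing:
# only one of the four equally likely response patterns is (YES, NO).
outcomes = list(product([YES, NO], repeat=2))
chance_pair = sum(o == (YES, NO) for o in outcomes) / len(outcomes)
print(chance_pair)                                  # 0.25
```

Under ordinary scoring the always-YES student still collects half of the available points, whereas under matched-pairs scoring the same response pattern earns no credit at all.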

It was expected that this scoring technique would increase the
reliability of the YES-NO test by decreasing the probability of a correct chance
response from one-in-two to one-in-four. It was also expected that this
technique would correct for response set. If the student tended to guess
YES, he would miss the NO items; if he tended to guess NO, he would miss the
YES items.

* After the teacher reads an item, the students mark either YES or NO
on their printed answer sheets.

2 Acquiescence response set is the tendency to respond YES when not
responding from knowledge. In this paper, the tendency to respond NO is
called dissent response set.

The PET-1 test was administered to experimental and control groups of
first-grade children--students who had been instructed in economic concepts,
and students who had not. The test was then scored in both the ordinary
manner and using the Matched-Pairs technique. In both the experimental
group and the control group, Matched-Pairs scores were more reliable
than ordinary YES-NO scores. 3

Although Matched-Pairs scoring was successful in increasing reliability,
the corrected Matched-Pairs split-half reliability coefficient of .60 was
barely adequate for comparing group means and fell short of the .90 usually
considered desirable for discriminating between individual scores. Even
though reliability might have been increased by increasing the length of the
test, other approaches to improving the YES-NO test were also investigated.
Cronbach (1942) suggested that the reliability of YES-NO tests could be
increased by writing tests containing only NO items. Since most people tend
to acquiesce rather than dissent, a NO response would more frequently be made
from knowledge than would a YES response. However, NO-item-only tests (All-NO
tests) probably favor the dissenter: a person who tends to respond NO when
not responding from knowledge would obtain a spuriously high score on such
an achievement test.

No comparison of the reliability and validity of YES-NO, Matched-Pairs,
and All-NO test scores was available in the literature, although this
information would be important to researchers and teachers when selecting test
formats.

3 The corrected split-half reliability coefficient for scores based on
thirty YES-NO items was .35 in the experimental group. In the same group,
the corrected coefficient for scores based on fifteen matched-pairs of items
was .60. Coefficients for YES-NO and Matched-Pairs scores in the control
group were .14 and .46. Split-half reliability coefficients as high as .85
have been obtained with Matched-Pairs scoring on longer tests.



OBJECTIVES 

The objectives of this study were to: 

1. Devise a first-grade economics achievement test or tests
which would yield YES-NO, Matched-Pairs, and All-NO scores,
and

2. Determine whether these scores differed in reliability and
validity.

EXPECTATIONS 

Based on previous experience with YES-NO and Matched-Pairs tests,
Cronbach's advice concerning YES-NO and All-NO tests, and limited experience
with one All-NO test, it was expected that in reliability the tests would
rank: All-NO, Matched-Pairs, and YES-NO, with All-NO the highest.

Expectations concerning validity were based on the a priori argument
that All-NO scores for highly acquiescent students would be lower than for
less acquiescent but equally knowledgeable students. That is, All-NO
scores would confound response set and knowledge. It was expected that if
this confounding were serious, certain predictions based on the construct
of knowledge would not be confirmed with the All-NO test. 4 Those predictions
were:

1. PET-1 achievement scores for knowledgeable groups will be more
reliable than for ignorant groups.

2. PET-1 achievement scores for knowledgeable groups will be more
variable than for ignorant groups.

3. PET-1 means for knowledgeable groups will be larger at the .01
level of significance than for ignorant groups.

4 These predictions were based on the assumption that students who
received an experimental treatment would be knowledgeable compared to
students who did not receive the experimental treatment. As reported
elsewhere, this assumption appears to have been sound (Larkins, 1968).



PROCEDURE

Two Primary Economics Tests were written based on Our Working World:
Families at Work (Senesh, 1963). The first was a 74-item All-NO test. The
second was a similar YES-NO test, written with reversals so that it could be
scored using the Matched-Pairs technique. These tests were administered,
as part of a larger study (Larkins, 1968), to two experimental groups and
one control group. Students were selected as classroom units without
randomization. Three experimental classes and three control classes were selected
from two school districts in northern Utah. Three experimental classes were
selected from the Elkhart, Indiana, School District. Although students were
not selected randomly, Tests of General Ability (Flanagan, 1959) scores were
obtained and used to correct for initial differences in mental ability when
PET-1 means were compared.

Split-half reliability coefficients, corrected with the Spearman-Brown
Prophecy Formula, were computed and compared for sets of YES-NO,
Matched-Pairs, and All-NO scores. This was done to test the expectation concerning
reliability, and to test the first prediction related to validity. This was
the only part of the analysis in which the 74-item All-NO test was used.
When means, standard deviations, F-tests, and t-tests were computed to test
the second and third predictions related to validity, it was necessary to
hold constant all other factors except test form. This was accomplished by
obtaining YES-NO, Matched-Pairs, and All-NO scores from a single
administration of the YES-NO test. The All-NO scores for this part of the analysis
were obtained, therefore, by using only the NO items on the YES-NO test.
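A corrected split-half coefficient of the kind described above might be computed along the following lines. This is only a sketch: the paper does not say how the halves were formed, so the odd-even split used here is an assumption, and the five-student score matrix is invented for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

def spearman_brown(r_half):
    """Step a half-test correlation up to the reliability of the full test."""
    return 2 * r_half / (1 + r_half)

def split_half_reliability(item_scores):
    """item_scores holds one list of 0/1 item (or pair) scores per student.
    Halves are formed from alternating items (an assumed odd-even split)."""
    first_half = [sum(student[0::2]) for student in item_scores]
    second_half = [sum(student[1::2]) for student in item_scores]
    return spearman_brown(pearson_r(first_half, second_half))

# Hypothetical scores for five students on six items (not data from the study).
scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
]
reliability = split_half_reliability(scores)
print(round(reliability, 2))
```

The same function applies to Matched-Pairs data if each entry is the 0/1 credit for a pair rather than for a single item.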
FINDINGS

Findings presented in Table 1 support the expectation concerning
reliability. As predicted, in reliability the three sets of scores ranked:
All-NO, Matched-Pairs, and YES-NO.



Table 1. Split-half reliability coefficients for YES-NO, Matched-Pairs,
and All-NO scores.

Group   N(a)   YES-NO         YES-NO Matched-Pairs   All-NO
               75 Items(b)    37 Pairs               74 Items

  1      77      .68            .85                    .90
  2      59      .48            .66                    .89
  3      77      .29            .62                    .87

a The number of students in the group.
b The number of items on the test.



It was expected that certain predictions based on the construct of
knowledge would not be confirmed with the All-NO test. Three groups were
available with which to test these predictions. In Table 1, Group 1 is an
experimental group which was taught economic concepts under optimal
conditions. The teachers had both special training and previous experience
in using Families at Work, and the students' mean score on the Tests of
General Ability was close to one grade level above their grade at the time
of testing. Group 2 is also an experimental group, but was taught economics
under more nearly average conditions. The teachers had neither special
training nor experience in using Families at Work, and the group's mean score on
the mental abilities test was not above grade level. Group 3 is the control
group. These children received no instruction with the Families at Work
program. On three sets of PET-1 scores, the groups ranked 1, 2, and 3 in
knowledge of economic concepts (see Table 2).

The first prediction related to validity was that PET-1 scores for
knowledgeable groups would be more reliable than for less knowledgeable groups. It
can be seen in Table 1 that the reliability coefficient for the All-NO test
is nearly as large for the least knowledgeable group as it is for the most
knowledgeable group. This was not true of the YES-NO or Matched-Pairs tests.
Because the All-NO test produced such stable reliability coefficients, its
validity must be questioned. A two-option test which produces reliability
coefficients which do not vary from knowledgeable to ignorant groups is
probably testing something other than, or in addition to, knowledge.

The second prediction based on the construct of knowledge was that
PET-1 scores for knowledgeable groups would be more variable than for less
knowledgeable groups. Findings presented in Table 2 indicate that this
prediction was confirmed with the YES-NO scores and the Matched-Pairs
scores, but not with the All-NO scores. 5

5 The difference between standard deviations for Groups 1 and 3,
checked using the variance ratio, was significant at the .05 level for
the YES-NO test, and at the .01 level for the Matched-Pairs test. Since
the predictions were directional, a one-tailed test of significance was
used.
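The variance ratios behind this comparison can be recomputed from the standard deviations reported in Table 2. The sketch below only forms the ratios for Groups 1 and 3; judging significance would still require an F table with the appropriate degrees of freedom, which is not reproduced here. The dictionary layout and function name are illustrative choices, not the authors'.

```python
def variance_ratio(sd_a, sd_b):
    """Ratio of the two variances implied by a pair of standard deviations."""
    return (sd_a / sd_b) ** 2

# Standard deviations for Group 1 and Group 3, as listed in Table 2.
# The YES-NO values are the halved ones; halving both leaves the ratio unchanged.
table2_sd = {
    "YES-NO": (3.76, 2.97),
    "Matched-Pairs": (6.58, 4.78),
    "All-NO": (6.31, 6.46),
}

ratios = {form: variance_ratio(sd1, sd3) for form, (sd1, sd3) in table2_sd.items()}
for form, ratio in ratios.items():
    print(f"{form}: {ratio:.2f}")
# YES-NO: 1.60
# Matched-Pairs: 1.89
# All-NO: 0.95
```

Only the All-NO ratio falls below 1, which agrees with the statement that the prediction of greater variability in the knowledgeable group was not confirmed with the All-NO scores.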







Table 2. Means and standard deviations for YES-NO, Matched-Pairs, and All-NO
scores derived from a single administration of the YES-NO test.

               YES-NO(b)         Matched-Pairs     All-NO
Group   N(a)   M(c)     SD(d)    M       SD        M       SD

  1      77    27.75    3.76     20.46   6.58      25.16   6.31
  2      59    24.15    3.38     15.14   5.41      20.05   6.58
  3      46    23.30    2.97     13.65   4.78      19.15   6.46

a The number of students in the group.
b The YES-NO test is twice as long as the others. In order to make a direct
comparison, its means and standard deviations were reduced by half.
c The mean.
d The standard deviation.



Two observations are of particular interest in regard to Table 2.

1. For all three groups, the YES-NO test tends to be less variable than
either of the other tests; its standard deviations are smaller. 6 One
explanation might be that, since students tend to be acquiescent, YES items obscure
differences between ignorant and knowledgeable students. Both respond YES;
one from knowledge, the other from response set. The reduction in variability
among students on the YES items may reduce the standard deviation for the
total test.

2. In Groups 2 and 3, the least knowledgeable groups, the standard
deviations for the All-NO test are larger than for the Matched-Pairs test. 7
Furthermore, the standard deviations for the All-NO test are similar in
all three groups, but the standard deviations for the Matched-Pairs and
the YES-NO test decrease from Groups 1 to 3.

The All-NO and YES-NO standard deviations are probably spurious if taken
as indicators of variability in knowledge. Scores on the YES-NO test apparently
are less variable in all groups than they would be if the instrument were not
measuring dissent in addition to knowledge.

6 The differences between the standard deviations for the YES-NO test
and each of the other two tests were significant at the .01 level in each
instance. However, the computation did not take into account the correlation
of the test scores which is caused by using a single group. The significance
of the difference between standard deviations is, therefore, probably even
higher.

7 The difference between standard deviations was significant at the .05
level for Group 2, and at the .01 level for Group 3. However, since the
formula for correlated scores was not used, the significance of the difference
between standard deviations is probably even higher.

The third prediction based on the construct of knowledge was that PET-1
means for knowledgeable groups would be larger at the .01 level of
significance than for ignorant groups. As indicated by the standard deviations in
Table 2, the variability of YES-NO and All-NO scores is affected by response
set as well as knowledge. Since parametric tests of significance utilize
sample variance, i.e., the standard deviation, to estimate population
variance, it is possible that when acquiescence is confounded with knowledge,
groups might appear to differ in knowledge when they do not, or groups
might appear not to differ in knowledge when they do differ.

The significance of the difference among PET-1 means for the three
groups was tested using analysis of covariance, with adjustments for initial
differences in mental ability. 8 The significance of the differences between
PET-1 means for pairs of groups was then tested using the t-test. Table 3
presents the t-ratios for the three PET-1 tests and the three groups.

8 F-ratios comparing all three groups on the YES-NO, Matched-Pairs,
and All-NO tests were 19.34, 16.03, and 9.16. F.01 = 4.71.
Table 3. T-ratios between PET-1 means adjusted for initial differences in
mental ability.

Groups      Adjusted    Adjusted         Adjusted
            YES-NO      Matched-Pairs    All-NO

1 and 2     1.96        1.10              .54
2 and 3     2.53        2.81             2.32
1 and 3     4.53        4.00             3.00

t.01 = 2.61



Since Groups 1 and 2 received the experimental treatment, it was expected 
that their PET-1 means would not differ. Since Group 3 did not receive the 
experimental treatment, it was expected that its PET-1 means would differ 
from those for both Group 1 and Group 2. The findings in Table 3 indicate 
that Groups 1 and 3 differed at the .01 level of significance for all three 
PET-1 tests, as expected. However, results with the YES-NO test very nearly
failed to confirm the expectation that the PET-1 means for Groups 1 and 2
would not differ; the t-ratio is nearly significant at the .05 level. And
both the YES-NO test and the All-NO test failed to confirm the prediction
that PET-1 means for Groups 2 and 3 would differ at the .01 level of
significance. In other words, the Matched-Pairs test was the only one to uniformly
confirm the prediction that PET-1 means for knowledgeable groups of students 
would significantly differ from PET-1 means for less knowledgeable groups 
of students. 
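The pattern of confirmations described above can be checked mechanically against the t-ratios in Table 3. In the sketch below, the critical value of 2.61 is an assumption: it is read from the damaged footnote to Table 3 and is consistent with the text, but it is not otherwise verified. The names used are illustrative only.

```python
T_CRITICAL_01 = 2.61  # assumed .01-level critical value (from the Table 3 footnote)

# t-ratios from Table 3, keyed by group comparison and then by test form.
t_ratios = {
    "1 and 2": {"YES-NO": 1.96, "Matched-Pairs": 1.10, "All-NO": 0.54},
    "2 and 3": {"YES-NO": 2.53, "Matched-Pairs": 2.81, "All-NO": 2.32},
    "1 and 3": {"YES-NO": 4.53, "Matched-Pairs": 4.00, "All-NO": 3.00},
}

# Expectations stated in the text: the two experimental groups should not
# differ from each other, and the control group should differ from both.
expected_to_differ = {"1 and 2": False, "2 and 3": True, "1 and 3": True}

def confirms_all(form):
    """True when every comparison for this test form matches the expectation."""
    return all(
        (t_ratios[pair][form] >= T_CRITICAL_01) == expected_to_differ[pair]
        for pair in t_ratios
    )

results = {form: confirms_all(form) for form in ("YES-NO", "Matched-Pairs", "All-NO")}
print(results)
# {'YES-NO': False, 'Matched-Pairs': True, 'All-NO': False}
```

With this critical value, only the Matched-Pairs scores match all three expectations, in line with the conclusion drawn in the text.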

SUMMARY 

Larkins and Shaver (1967) reported that the validity and reliability of
the YES-NO type test could be improved by writing items with reversals and
scoring the test with matched-pairs. Cronbach (1942) suggested that the
reliability of the YES-NO type test could be improved by including only NO
items. However, on a priori grounds it appeared that the All-NO test would
produce spuriously high achievement scores for students who were not
acquiescent. Furthermore, no direct comparison had been made between
Matched-Pairs tests and All-NO tests for validity and reliability.

In the present study, YES-NO, Matched-Pairs, and All-NO scores were
compared. It was concluded that the All-NO test had greater reliability than
the YES-NO or Matched-Pairs tests. It was also concluded, however, that the
All-NO test was less valid than the Matched-Pairs test. This conclusion
was based on the lack of confirmation, with the All-NO test, of three
predictions based on the construct of knowledge. One of these predictions was
also not confirmed with the YES-NO test.

Based on these results, researchers and primary-grade teachers would be
well-advised to use Matched-Pairs scoring when writing YES-NO type
achievement tests. 9
9 Research utilizing several Matched-Pairs tests (Larkins, 1968) has
indicated that to obtain adequate reliability for establishing grades for
individual primary-grade students, at least 60 pairs of items will frequently
be needed.
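As a rough consistency check on this figure, the general Spearman-Brown formula can be inverted to estimate how long a test must be to reach a target reliability. The sketch below is not the authors' computation; it assumes added pairs behave like the existing ones and applies the formula to the Matched-Pairs coefficients in Table 1, with .90 as the target mentioned earlier for individual scores. The function names are hypothetical.

```python
import math

def lengthening_factor(r_current, r_target):
    """Factor by which a test must be lengthened to move from r_current to
    r_target, from solving r_target = n*r / (1 + (n - 1)*r) for n."""
    return r_target * (1 - r_current) / (r_current * (1 - r_target))

def pairs_needed(r_current, n_pairs, r_target=0.90):
    """Estimated number of matched pairs needed to reach r_target."""
    estimate = lengthening_factor(r_current, r_target) * n_pairs
    # Round first so floating-point noise cannot push an exact value upward.
    return math.ceil(round(estimate, 9))

# Matched-Pairs coefficients from Table 1, each based on 37 pairs.
for group, r in ((1, 0.85), (2, 0.66), (3, 0.62)):
    print(group, pairs_needed(r, 37))
# 1 59
# 2 172
# 3 205
```

For the most knowledgeable group the estimate is close to the 60 pairs cited above; for the less knowledgeable groups the formula suggests that considerably longer tests would be required.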